Using Likelihood L-statistic as Confidence Measure in Audio-visual Speech Recognition

Authors

  • Arpita Ghosh
  • Ashish Verma
  • Abhinanda Sarkar
Abstract

This paper describes recent work on decision fusion in audio-visual speech recognition. It proposes a novel approach to combining the information in the audio and video channels. For simplicity, we consider only the frame-level phonetic classification problem, using two single-stream Gaussian Mixture Models (GMMs). The audio and video streams are adaptively weighted using the present sample confidence value together with a cumulative mean of the sample confidence values over past frames. The confidence values for the audio and video decisions are computed using an L-statistic (a linear combination of order statistics) of the log-likelihoods against the phone models. Experiments on a database of about 15000 sentences of large-vocabulary continuous speech show that the proposed approach yields better classification accuracy than other approaches.
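
The weighting scheme summarized in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation, because the abstract does not give the L-statistic coefficients, the rule for mixing present and past confidence, or the fusion formula. The sketch makes three assumptions: the coefficients `(1, -1)`, so that confidence is the gap between the best and second-best log-likelihood; a hypothetical mixing factor `alpha` between the present confidence and the cumulative mean over past frames; and a weighted sum of the per-stream log-likelihoods. The names `l_statistic` and `fuse_frames` are hypothetical as well.

```python
import numpy as np


def l_statistic(log_likelihoods, coeffs):
    """Linear combination of order statistics (values sorted in descending order)."""
    ordered = np.sort(np.asarray(log_likelihoods, dtype=float))[::-1]
    return float(np.dot(coeffs, ordered[:len(coeffs)]))


def fuse_frames(audio_ll, video_ll, coeffs=(1.0, -1.0), alpha=0.5):
    """Fuse per-frame phone log-likelihoods from two streams.

    audio_ll, video_ll: arrays of shape (n_frames, n_phones).
    coeffs and alpha are illustrative assumptions, not values from the paper.
    Returns the per-frame phone decisions and the audio weights.
    """
    audio_ll = np.asarray(audio_ll, dtype=float)
    video_ll = np.asarray(video_ll, dtype=float)
    sum_a = sum_v = 0.0  # running sums of past confidence values
    decisions, audio_weights = [], []
    for t in range(len(audio_ll)):
        c_a = l_statistic(audio_ll[t], coeffs)
        c_v = l_statistic(video_ll[t], coeffs)
        # Cumulative mean over past frames; fall back to the present
        # value on the first frame, where no past exists.
        past_a = sum_a / t if t > 0 else c_a
        past_v = sum_v / t if t > 0 else c_v
        s_a = alpha * c_a + (1.0 - alpha) * past_a
        s_v = alpha * c_v + (1.0 - alpha) * past_v
        total = s_a + s_v
        w_a = s_a / total if total > 0 else 0.5  # equal weights if both are zero
        fused = w_a * audio_ll[t] + (1.0 - w_a) * video_ll[t]
        decisions.append(int(np.argmax(fused)))
        audio_weights.append(w_a)
        sum_a += c_a
        sum_v += c_v
    return decisions, audio_weights
```

With these coefficients the confidence is never negative, so the normalized weights stay in [0, 1]. Other coefficient choices would need a different normalization.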

Similar resources

Stream confidence estimation for audio-visual speech recognition

We investigate the use of single-modality confidence measures as a means of estimating adaptive, local weights for improved audio-visual automatic speech recognition. We limit our work to the toy problem of audio-visual phonetic classification by means of a two-stream Gaussian mixture model (GMM), where each stream models the class-conditional audio- or visual-only observation probability, raised...


Using likelihood L-statistics to measure confidence in audio-visual speech recognition

This paper describes recent work on decision fusion in audiovisual speech recognition. In this work, a novel approach is proposed to combine audio and video channel information in an audiovisual speech recognition scenario. We have considered the frame-level phonetic classification problem using two single-stream Gaussian Mixture Models. Audio and video streams are adaptively weighted using a cumulativ...


Improving visual noise insensitivity in small vocabulary audio visual speech recognition applications

Visual noise insensitivity is important to audio visual speech recognition (AVSR). Visual noise can take on a number of forms such as varying frame rate, occlusion, lighting or speaker variabilities. In this paper the use of a high dimensional secondary classifier on the word likelihood scores from both the audio and video modalities is investigated for the purposes of adaptive fusion. Prelimin...


Correcting Korean vowel speech recognition errors with limited lip features

In the experiment, we evaluate the audio-only and the selected lip feature based visual-only speech recognition performances separately. For audio-only speech recognition, we use HTK3.2 [4], and adopt normalized log likelihood ratio (LLR) scores as the N-best confidence scores. For the visual part, we build a back propagation neural network by using SNNS 4.2 [5] based on the selected lip featur...


Stream weight optimization of speech and lip image sequence for audio-visual speech recognition

Bimodal speech recognition systems, which use visual information to supplement acoustic information, have been shown to yield better recognition performance than purely acoustic systems, especially when background noise is present. The early integration strategy for HMM-based audio-visual speech recognition is one promising approach, where the output probability is obtained by product of o...


Publication date: 2000